Task-Oriented Intrinsic Evaluation of Semantic Textual Similarity

Authors

Nils Reimers

Philip Beyer

Iryna Gurevych

Abstract

Semantic Textual Similarity (STS) is a foundational NLP task that can be used in a wide range of applications. Hundreds of different STS systems exist for determining the STS of two texts; however, it is hard for an NLP system designer to decide which system is the best one. To answer this question, an intrinsic evaluation of the STS systems is conducted by comparing the output of each system to human judgments of semantic similarity. The comparison is usually done using Pearson correlation. In this work, we show that relying on intrinsic evaluations with Pearson correlation can be misleading. In three common STS-based tasks, we observed that Pearson correlation was especially ill-suited to detecting the best STS system for the task, and that other evaluation measures were much better suited. We define how the validity of an intrinsic evaluation can be assessed and compare different intrinsic evaluation methods. Understanding the properties of the targeted task is crucial, and we propose a framework for conducting the intrinsic evaluation that takes these properties into account.
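
To illustrate how the choice of measure can change which system looks best in an intrinsic evaluation, the following is a minimal sketch in Python. It is not the authors' framework. The gold scores and the two system outputs are made-up numbers, and Spearman rank correlation is used only as an assumed example of an alternative measure, since the abstract does not name the alternatives it compares. The function names `pearson`, `average_ranks`, and `spearman` are hypothetical helpers written for this sketch.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: human similarity judgments and two system outputs.
gold = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
system_a = [0.5, 0.2, 2.0, 3.0, 4.0, 5.0]      # near-linear, one pair misordered
system_b = [0.0, 0.01, 0.02, 0.03, 0.04, 5.0]  # perfect ordering, non-linear scale

for name, scores in [("A", system_a), ("B", system_b)]:
    print(name,
          "pearson=%.3f" % pearson(gold, scores),
          "spearman=%.3f" % spearman(gold, scores))
```

On this toy data, Pearson correlation ranks system A above system B, while the rank-based measure prefers system B because it orders every pair correctly. Which behavior is desirable depends on the target task: a task that only needs a ranking of candidates has little use for linearly calibrated scores.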

Similar resources

Robust semantic text similarity using LSA, machine learning, and linguistic resources

Semantic textual similarity is a measure of the degree of semantic equivalence between two pieces of text. We describe the SemSim system and its performance in the *SEM 2013 and SemEval-2014 tasks on semantic textual similarity. At the core of our system lies a robust distributional word similarity component that combines Latent Semantic Analysis and machine learning augmented with data from se...

Learning the Impact of Machine Translation Evaluation Metrics for Semantic Textual Similarity

We present a work to evaluate the hypothesis that automatic evaluation metrics developed for Machine Translation (MT) systems have significant impact on predicting semantic similarity scores in Semantic Textual Similarity (STS) task for English, in light of their usage for paraphrase identification. We show that different metrics may have different behaviors and significance along the semantic ...

Análise de Medidas de Similaridade Semântica na Tarefa de Reconhecimento de Implicação Textual (Analysis of Semantic Similarity Measures in the Recognition of Textual Entailment Task)[In Portuguese]

In this work, we present a feature-based approach to the RTE (Recognizing Text Entailment) task that verifies the similarity between two sentences including syntactic and semantic aspects. The selected features come from the winning work of the RTE task of the workshop ASSIN (Semantic Similarity Evaluation and Textual Inference) with some changes and addition of other semantic feature. The eval...

UoW: NLP techniques developed at the University of Wolverhampton for Semantic Similarity and Textual Entailment

This paper presents the system submitted by University of Wolverhampton for SemEval-2014 task 1. We proposed a machine learning approach which is based on features extracted using Typed Dependencies, Paraphrasing, Machine Translation evaluation metrics, Quality Estimation metrics and Corpus Pattern Analysis. Our system performed satisfactorily and obtained 0.711 Pearson correlation for the sema...

Intrinsic and extrinsic approaches to recognizing textual entailment

Recognizing Textual Entailment (RTE) is to detect an important relation between two texts, namely whether one text can be inferred from the other. For natural language processing, especially for natural language understanding, this is a useful and challenging task. We start with an introduction of the notion of textual entailment, and then define the scope of the recognition task. We summarize ...

Publication date: 2016
